An empirical study on software defect prediction with a simplified metric set

نویسندگان

  • Peng He
  • Bing Li
  • Xiao Liu
  • Jun Chen
  • Yutao Ma
چکیده

Context: Software defect prediction plays a crucial role in estimating the most defect-prone components of software, and a large number of studies have pursued improving prediction accuracy within a project or across projects. However, the rules for making an appropriate decision between withinand cross-project defect prediction when available historical data are insufficient remain unclear. Objective: The objective of this work is to validate the feasibility of the predictor built with a simplified metric set for software defect prediction in different scenarios, and to investigate practical guidelines for the choice of training data, classifier and metric subset of a given project. Method: First, based on six typical classifiers, we constructed three types of predictors using the size of software metric set in three scenarios. Then, we validated the acceptable performance of the predictor based on Top-k metrics in terms of statistical methods. Finally, we attempted to minimize the Top-k metric subset by removing redundant metrics, and we tested the stability of such a minimum metric subset with one-way ANOVA tests. Results: The study has been conducted on 34 releases of 10 open-source projects available at the PROMISE repository. The findings indicate that the predictors built with either Top-k metrics or the minimum metric subset can provide an acceptable result compared with benchmark predictors. The guideline for choosing a suitable simplified metric set in different scenarios is presented in Table 10. Conclusion: The experimental results indicate that (1) the choice of training data should depend on the specific requirement of prediction accuracy; (2) the predictor built with a simplified metric set works well and is very useful in case limited resources are supplied; (3) simple classifiers (e.g., Naı̈ve Bayes) also tend to perform well when using a simplified metric set for defect prediction; (4) in several cases, the minimum metric subset can be identified to facilitate the procedure of general defect prediction with acceptable loss of prediction precision in practice.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Simplification of Training Data for Cross-Project Defect Prediction

Cross-project defect prediction (CPDP) plays an important role in estimating the most likely defect-prone software components, especially for new or inactive projects. To the best of our knowledge, few prior studies provide explicit guidelines on how to select suitable training data of quality from a large number of public software repositories. In this paper, we have proposed a training data s...

متن کامل

Comparing the Effectiveness of Machine Learning Algorithms for Defect Prediction

Software repositories with defect logs are main resource for defect prediction. In recent years, researchers have used the vast amount of data that is contained by software repositories to predict the location of defect in the code that caused problem. In this paper machine learning approach is used for predicting the modules with defect for embedded data set. Public datasets from the promise r...

متن کامل

Defect prediction with bad smells in code

Streszczenie Background. Defect prediction in software can be highly beneficial for development projects, when prediction is highly effective and defect-prone areas are predicted correctly. One of the key elements to gain effective software defect prediction is proper selection of metrics used for dataset preparation. Objective. The purpose of this research is to verify, whether code smells met...

متن کامل

Towards Cross-Project Defect Prediction with Imbalanced Feature Sets

Cross-project defect prediction (CPDP) has been deemed as an emerging technology of software quality assurance, especially in new or inactive projects, and a few improved methods have been proposed to support better defect prediction. However, the regular CPDP always assumes that the features of training and test data are all identical. Hence, very little is known about whether the method for C...

متن کامل

Forecasting Field Defect Rates Using a Combined Time-based and Metric–based Approach a Case Study of OpenBSD

Open source software systems are critical infrastructure for many applications; however, little has been precisely measured about their quality. Forecasting the field defect-occurrence rate over the entire lifespan of a release before deployment for open source software systems may enable informed decision-making. In this paper, we present an empirical case study of ten releases of OpenBSD. We ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Information & Software Technology

دوره 59  شماره 

صفحات  -

تاریخ انتشار 2015